Dependency Parsing of Turkish
نویسندگان
چکیده
The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
منابع مشابه
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملStatistical Dependency Parsing for Turkish
This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free-constituent order language with complex agglutinative inflectional and derivational morphology and presents interesting challenges for statistical parsing, as in general, dependency relations are between “portions” of words – called inflectional groups. We have explored statistical models tha...
متن کاملTurkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing
So far predicted scenarios for Turkish dependency parsing have used a morphological disambiguator that is trained on the data distributed with the tool(Sak et al., 2008). Although models trained on this data have high accuracy scores on the test and development data of the same set, the accuracy drastically drops when the model is used in the preprocessing of Turkish Treebank parsing experiment...
متن کاملTowards Joint Morphological Analysis and Dependency Parsing of Turkish
Turkish is an agglutinative language with rich morphology-syntax interactions. As an extension of this property, the Turkish Treebank is designed to represent sublexical dependencies, which brings extra challenges to parsing raw text. In this work, we use a joint POS tagging and parsing approach to parse Turkish raw text, and we show it outperforms a pipeline approach. Then we experiment with i...
متن کاملInitial Explorations in Two-phase Turkish Dependency Parsing by Incorporating Constituents
This paper describes a two-phase Turkish dependency parsing which separates dependency and labeling into two similar to (McDonald et al., 2006b). First, in order to solve the long distance dependency attachment problem, the sentences are split into constituents and the dependencies are estimated on shorter sentences. Later, for better estimation of labels, Conditional Random Fields (CRFs) are u...
متن کاملThe Impact of Automatic Morphological Analysis & Disambiguation on Dependency Parsing of Turkish
The studies on dependency parsing of Turkish so far gave their results on the Turkish Dependency Treebank. This treebank consists of gold standard sentences where part-of-speech tags are manually assigned to each word and the words forming multi word expressions are also manually determined and combined into single units. For the first time, we investigate the results of parsing Turkish sentenc...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computational Linguistics
دوره 34 شماره
صفحات -
تاریخ انتشار 2008